Results 1 - 20 of 75
1.
Conference on Human Factors in Computing Systems - Proceedings ; 2023.
Article in English | Scopus | ID: covidwho-20245382

ABSTRACT

Large language models have abilities in creating high-volume human-like texts and can be used to generate persuasive misinformation. However, the risks remain under-explored. To address the gap, this work first examined characteristics of AI-generated misinformation (AI-misinfo) compared with human creations, and then evaluated the applicability of existing solutions. We compiled human-created COVID-19 misinformation and abstracted it into narrative prompts for a language model to output AI-misinfo. We found significant linguistic differences within human-AI pairs, and patterns of AI-misinfo in enhancing details, communicating uncertainties, drawing conclusions, and simulating personal tones. While existing models remained capable of classifying AI-misinfo, a significant performance drop compared to human-misinfo was observed. Results suggested that existing information assessment guidelines had questionable applicability, as AI-misinfo tended to meet criteria in evidence credibility, source transparency, and limitation acknowledgment. We discuss implications for practitioners, researchers, and journalists, as AI can create new challenges to the societal problem of misinformation. © 2023 Owner/Author.

2.
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations ; : 67-74, 2023.
Article in English | Scopus | ID: covidwho-20245342

ABSTRACT

In this demo, we introduce a web-based misinformation detection system PANACEA on COVID-19 related claims, which has two modules, fact-checking and rumour detection. Our fact-checking module, which is supported by novel natural language inference methods with a self-attention network, outperforms state-of-the-art approaches. It is also able to give automated veracity assessment and ranked supporting evidence with the stance towards the claim to be checked. In addition, PANACEA adapts the bi-directional graph convolutional networks model, which is able to detect rumours based on comment networks of related tweets, instead of relying on the knowledge base. This rumour detection module assists by warning the users in the early stages when a knowledge base may not be available. © 2023 Association for Computational Linguistics.

3.
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference ; : 2644-2656, 2023.
Article in English | Scopus | ID: covidwho-20243588

ABSTRACT

In automated scientific fact-checking, machine learning models are trained to verify scientific claims given evidence. A major bottleneck of this task is the availability of large-scale training datasets on different domains, due to the required domain expertise for data annotation. However, multiple-choice question-answering datasets are readily available across many different domains, thanks to the modern online education and assessment systems. As one of the first steps towards addressing the fact-checking dataset scarcity problem in scientific domains, we propose a pipeline for automatically converting multiple-choice questions into fact-checking data, which we call Multi2Claim. By applying the proposed pipeline, we generated two large-scale datasets for scientific-fact-checking: Med-Fact and Gsci-Fact for the medical and general science domains, respectively. These two datasets are among the first examples of large-scale scientific-fact-checking datasets. We developed baseline models for the verdict prediction task using each dataset. Additionally, we demonstrated that the datasets could be used to improve performance measured by weighted F1 on existing fact-checking datasets such as SciFact, HEALTHVER, COVID-Fact, and CLIMATE-FEVER. In some cases, the improvement in performance was up to a 26% increase. The generated datasets are publicly available. © 2023 Association for Computational Linguistics.

4.
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations ; : 35-42, 2023.
Article in English | Scopus | ID: covidwho-20234954

ABSTRACT

In recent years, COVID-19 has impacted all aspects of human life. As a result, numerous publications relating to this disease have been issued. Due to the massive volume of publications, some retrieval systems have been developed to provide researchers with useful information. In these systems, lexical searching methods are widely used, which raises many issues related to acronyms, synonyms, and rare keywords. In this paper, we present a hybrid relation retrieval system, CovRelex-SE, based on embeddings to provide high-quality search results. Our system can be accessed through the following URL: https://www.jaist.ac.jp/is/labs/nguyen-lab/systems/covrelex-se/. © 2023 Association for Computational Linguistics.

5.
EACL 2023 - 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of System Demonstrations ; : 1-10, 2023.
Article in English | Scopus | ID: covidwho-20232037

ABSTRACT

Open-retrieval question answering systems are generally trained and tested on large datasets in well-established domains. However, low-resource settings such as new and emerging domains would especially benefit from reliable question answering systems. Furthermore, multilingual and cross-lingual resources in emergent domains are scarce, leading to few or no such systems. In this paper, we demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19. Our system adopts a corpus of scientific articles to ensure that retrieved documents are reliable. To address the scarcity of cross-lingual training data in emergent domains, we present a method utilizing automatic translation, alignment, and filtering to produce English-to-all datasets. We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting. We illustrate the capabilities of our system with examples and release all code necessary to train and deploy such a system. © 2023 Association for Computational Linguistics.

6.
Proceedings of the ACM on Human-Computer Interaction ; 7(CSCW1), 2023.
Article in English | Scopus | ID: covidwho-2315922

ABSTRACT

Artificial Intelligence (AI) is a transformative force in communication and messaging strategy, with potential to disrupt traditional approaches. Large language models (LLMs), a form of AI, are capable of generating high-quality, humanlike text. We investigate the persuasive quality of AI-generated messages to understand how AI could impact public health messaging. Specifically, through a series of studies designed to characterize and evaluate generative AI in developing public health messages, we analyze COVID-19 pro-vaccination messages generated by GPT-3, a state-of-the-art instantiation of a large language model. Study 1 is a systematic evaluation of GPT-3's ability to generate pro-vaccination messages. Study 2 then observed people's perceptions of curated GPT-3-generated messages compared to human-authored messages released by the CDC (Centers for Disease Control and Prevention), finding that GPT-3 messages were perceived as more effective and as stronger arguments, and evoked more positive attitudes than CDC messages. Finally, Study 3 assessed the role of source labels on perceived quality, finding that while participants preferred AI-generated messages, they expressed dispreference for messages that were labeled as AI-generated. The results suggest that, with human supervision, AI can be used to create effective public health messages, but that individuals prefer their public health messages to come from human institutions rather than AI sources. We propose best practices for assessing generative outputs of large language models in future social science research and ways health professionals can use AI systems to augment public health messaging. © 2023 ACM.

7.
7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 ; : 1-10, 2022.
Article in English | Scopus | ID: covidwho-2290872

ABSTRACT

Named Entity Recognition (NER) is a well-known problem for the natural language processing (NLP) community. It is a key component of different NLP applications, including information extraction, question answering, and information retrieval. In the literature, there are several Arabic NER datasets with different named entity tags; however, due to data and concept drift, we are always in need of new data for NER and other NLP applications. In this paper, first, we introduce Wassem, a web-based annotation platform for Arabic NLP applications. Wassem can be used to manually annotate textual data for a variety of NLP tasks: text classification, sequence classification, and word segmentation. Second, we introduce the COVID-19 Arabic Named Entities Recognition (CAraNER) dataset extracted from the Arabic Newspaper COVID-19 Corpus (AraNPCC). CAraNER has 55,389 tokens distributed over 1,278 sentences randomly extracted from Saudi Arabian newspaper articles published during 2019, 2020, and 2021. The dataset is labeled by five annotators with five named-entity tags, namely: Person, Title, Location, Organization, and Miscellaneous. The CAraNER corpus is available for download for free. We evaluate the corpus by finetuning four BERT-based Arabic language models on the CAraNER corpus. The best model was AraBERTv0.2-large with 0.86 for the F1 macro measure. © 2022 Association for Computational Linguistics.

8.
3rd Workshop on Figurative Language Processing, FigLang 2022, as part of EMNLP 2022 ; : 44-53, 2022.
Article in English | Scopus | ID: covidwho-2305386

ABSTRACT

Conceptual metaphors represent a cognitive mechanism to transfer knowledge structures from one onto another domain. Image-schematic conceptual metaphors (ISCMs) specialize on transferring sensorimotor experiences to domains. Natural language is believed to provide evidence of such metaphors. However, approaches to verify this hypothesis largely rely on top-down methods, gathering examples by way of introspection, or on manual corpus analyses. In order to contribute towards a method that is systematic and can be replicated, we propose to bring together existing processing steps in a pipeline to detect ISCMs, exemplified for the image schema SUPPORT in the COVID-19 domain. This pipeline consists of neural metaphor detection, dependency parsing to uncover construction patterns, clustering, and BERT-based frame annotation of dependent constructions to analyze ISCMs. © 2022 Association for Computational Linguistics.

9.
7th Arabic Natural Language Processing Workshop, WANLP 2022 held with EMNLP 2022 ; : 511-514, 2022.
Article in English | Scopus | ID: covidwho-2304479

ABSTRACT

Propaganda content has seen massive spread in the biggest social media networks. Major global events such as Covid-19, presidential elections, and wars have all been infested with various propaganda techniques. In participation in the WANLP 2022 Shared Task (Alam et al., 2022), this paper provides a detailed overview of our machine learning system for propaganda techniques classification and its achieved results. The task was carried out using pre-trained transformer-based models: ARBERT and MARBERT. The models were fine-tuned for the downstream task at hand: multilabel classification of Arabic tweets. According to the results, MARBERT and ARBERT attained 0.562 and 0.567 micro F1-score on the development set of subtask 1. The submitted model was MARBERT, which attained a 0.597 micro F1-score and got the fifth rank. © 2022 Association for Computational Linguistics.
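
For readers unfamiliar with the metric reported in this abstract, the following is a minimal sketch of how a micro-averaged F1 score is typically computed for multilabel classification. It is not the authors' evaluation code; the function name `micro_f1` and the example label names are hypothetical and chosen only for illustration.

```python
def micro_f1(gold, pred):
    """Micro-averaged F1 over multilabel predictions.

    gold and pred are equal-length lists of label sets, one set per example.
    Counts are pooled over all examples and labels before computing F1.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical label names, for illustration only.
gold = [{"loaded_language"}, {"name_calling", "smears"}, set()]
pred = [{"loaded_language"}, {"name_calling"}, {"smears"}]
score = micro_f1(gold, pred)
```

Because counts are pooled before averaging, frequent labels weigh more heavily in micro F1 than rare ones, which is the main difference from a macro-averaged score.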

10.
5th International Conference on Artificial Intelligence in Information and Communication, ICAIIC 2023 ; : 429-434, 2023.
Article in English | Scopus | ID: covidwho-2299037

ABSTRACT

The SARS-CoV-2 virus has long been evolving, posing an increased risk in terms of infectivity and transmissibility, which causes greater impact in communities worldwide. With the surge of collected SARS-CoV-2 sequences, studies found that most of the emerging variants are linked to increased mutations in the spike (S) protein, as observed in the Alpha, Beta, Gamma, and Delta variants. Multiple approaches to genomic surveillance have been performed to monitor the mutational status and spread of the virus; however, most are heavily dependent on labels attributed to these sequences. Hence, this study features a system that has the capability to learn the protein language model of SARS-CoV-2 spike proteins, based on a bidirectional long short-term memory (BiLSTM) recurrent neural network, using sequence data alone. Upon obtaining the sequence embedding from the model, observed clusters are generated using the Leiden clustering algorithm and are visualized to monitor similarities between variants in terms of grammatical probability and semantic change. Additionally, the system measures the validity of a user-generated next-generation sequence, capturing potential sequence mutations indicative of viral escape, particularly mutations by substitutions. Further studies on methods uncovering semantic rules that govern spike proteins are recommended to learn more about other viral characteristics conclusive of the future of the COVID-19 pandemic. © 2023 IEEE.

11.
1st International Conference on Machine Learning, Computer Systems and Security, MLCSS 2022 ; : 301-306, 2022.
Article in English | Scopus | ID: covidwho-2294226

ABSTRACT

The COVID-19 pandemic has been accompanied by such an explosive increase in media coverage and scientific publications that researchers find it difficult to keep up. We are therefore working on a COVID-19 dataset on the Omicron variant to recognise named entities in a given text. We collect the COVID-related data from newspapers and tweets. This article covers named entities such as COVID variant name, organization name, location name, and vaccine name. The process includes tokenisation, POS tagging, chunking, levelling, editing, and running the program. It will help us recognise named entities in a huge dataset, such as where COVID spread most (location), which variant spread most (variant name), and which vaccine has been given (vaccine name). In this work, we have identified the names. If we treat unemployment, economic downfall, death, recovery, and depression as topics, we can also identify the topic names and the phase in which they occurred. © 2022 IEEE.

12.
2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 ; : 371-378, 2022.
Article in English | Scopus | ID: covidwho-2275310

ABSTRACT

We recently introduced DRaiL, a declarative neuro-symbolic modeling framework designed to support a wide variety of NLP scenarios. In this demo, we enhance DRaiL with an easy to use Python interface equipped with methods to define, modify and augment models interactively, as well as with methods to debug and visualize the predictions made. We demonstrate this interface with two challenging NLP tasks: analyzing moral sentiment in political discourse, and analyzing opinions about the Covid-19 vaccine. © 2022 Association for Computational Linguistics.

13.
60th Annual Meeting of the Association for Computational Linguistics, ACL 2022 ; 1:2736-2749, 2022.
Article in English | Scopus | ID: covidwho-2274256

ABSTRACT

News events are often associated with quantities (e.g., the number of COVID-19 patients or the number of arrests in a protest), and it is often important to extract their type, time, and location from unstructured text in order to analyze these quantity events. This paper thus formulates the NLP problem of spatiotemporal quantity extraction, and proposes the first meta-framework for solving it. This meta-framework contains a formalism that decomposes the problem into several information extraction tasks, a shareable crowdsourcing pipeline, and transformer-based baseline models. We demonstrate the meta-framework in three domains (the COVID-19 pandemic, Black Lives Matter protests, and 2020 California wildfires) to show that the formalism is general and extensible, the crowdsourcing pipeline facilitates fast and high-quality data annotation, and the baseline system can handle spatiotemporal quantity extraction well enough to be practically useful. We release all resources for future research on this topic. © 2022 Association for Computational Linguistics.

14.
1st Workshop on NLP for COVID-19 at the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 ; 2020.
Article in English | Scopus | ID: covidwho-2271699

ABSTRACT

We present a simple NLP methodology for detecting COVID-19 misinformation videos on YouTube by leveraging user comments. We use transfer learning pre-trained models to generate a multi-label classifier that can categorize conspiratorial content. We use the percentage of misinformation comments on each video as a new feature for video classification. We show that the inclusion of this feature in simple models yields an accuracy of up to 82.2%. Furthermore, we verify the significance of the feature by performing a Bayesian analysis. Finally, we show that adding the first hundred comments as tf-idf features increases the video classifier accuracy by up to 89.4%. © ACL 2020. All rights reserved.
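
The per-video feature described in this abstract (the percentage of comments flagged as misinformation) could be derived roughly as sketched below. This is an illustration under assumptions, not the authors' pipeline: the function name `misinfo_comment_share` is hypothetical, and the boolean flags stand in for the output of whatever comment classifier is used.

```python
def misinfo_comment_share(comment_flags):
    """Percentage of a video's comments flagged as misinformation.

    comment_flags is a list of booleans, one per comment, assumed to come
    from an upstream comment classifier (not shown here).
    Returns 0.0 for a video with no comments.
    """
    if not comment_flags:
        return 0.0
    flagged = sum(1 for is_misinfo in comment_flags if is_misinfo)
    return 100.0 * flagged / len(comment_flags)


# Hypothetical classifier output for four comments on one video.
share = misinfo_comment_share([True, False, True, False])
```

The resulting number is a single scalar per video, so it can be appended to any other feature vector used by a video-level classifier.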

15.
1st Workshop on NLP for COVID-19 at the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 ; 2020.
Article in English | Scopus | ID: covidwho-2268591

ABSTRACT

The spread of COVID-19 has become a significant and troubling aspect of society in 2020. With millions of cases reported across countries, new outbreaks have occurred and followed patterns of previously affected areas. Many disease detection models do not incorporate the wealth of social media data that can be utilized for modeling and predicting its spread. It is useful to ask, can we utilize this knowledge in one country to model the outbreak in another? To answer this, we propose the task of cross-lingual transfer learning for epidemiological alignment. Utilizing both macro and micro text features, we train on Italy's early COVID-19 outbreak through Twitter and transfer to several other countries. Our experiments show strong results with up to 0.85 Spearman correlation in cross-country predictions. © ACL 2020. All rights reserved.
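
As background for the Spearman correlation reported in this abstract, here is a minimal sketch of the statistic: the Pearson correlation of the ranks of two series, with tied values given their average rank. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


# Hypothetical predicted vs. observed values for five time steps.
predicted = [1, 2, 3, 4, 5]
observed = [2, 1, 3, 4, 5]
rho = spearman(predicted, observed)
```

Because only ranks matter, the statistic rewards predictions that get the ordering of values right even when their magnitudes differ from the observed ones.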

16.
2022 Findings of the Association for Computational Linguistics: EMNLP 2022 ; : 5610-5622, 2022.
Article in English | Scopus | ID: covidwho-2268403

ABSTRACT

Online discussions are abundant with opinions towards a common topic, and identifying (dis)agreement between a pair of comments enables many opinion mining applications. Realizing the increasing needs to analyze opinions for emergent new topics that however tend to lack annotations, we present the first meta-learning approach for few-shot (dis)agreement identification that can be quickly applied to analyze opinions for new topics with few labeled instances. Furthermore, we enhance the meta-learner's domain generalization ability from two perspectives. The first is domain-invariant regularization, where we design a lexicon-based regularization loss to enable the meta-learner to learn domain-invariant cues. The second is domain-aware augmentation, where we propose domain-aware task augmentation for meta-training to learn domain-specific expressions. In addition to using an existing dataset, we also evaluate our approach on two very recent new topics, mask mandate and COVID vaccine, using our newly annotated datasets containing 1.5k and 1.4k SubReddits comment pairs respectively. Extensive experiments on three domains/topics demonstrate the effectiveness of our meta-learning approach. © 2022 Association for Computational Linguistics.

17.
1st Workshop on NLP for COVID-19 at the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 ; 2020.
Article in English | Scopus | ID: covidwho-2267317

ABSTRACT

Social media data can be a very salient source of information during crises. User-generated messages provide a window into people's minds during such times, allowing us insights about their moods and opinions. Due to the vast amounts of such messages, a large-scale analysis of population-wide developments becomes possible. In this paper, we analyze Twitter messages (tweets) collected during the first months of the COVID-19 pandemic in Europe with regard to their sentiment. This is implemented with a neural network for sentiment analysis using multilingual sentence embeddings. We separate the results by country of origin, and correlate their temporal development with events in those countries. This allows us to study the effect of the situation on people's moods. We see, for example, that lockdown announcements correlate with a deterioration of mood in almost all surveyed countries, which recovers within a short time span. © ACL 2020. All rights reserved.

18.
5th International Conference on Algorithms, Computing and Artificial Intelligence, ACAI 2022 ; 2022.
Article in English | Scopus | ID: covidwho-2265590

ABSTRACT

Our work aims to generate new ideas to explore in a specific domain using generative language models. For example, doctors can write about known symptoms as cues to the system, and then the system will generate ideas based on the cues. Similar scenarios can be thought of for other scientific domains. We used transformer-based decoders, especially GPT3-based transformer decoders, as the language models and generators. As the data, we used COVID-19 open research dataset [18]. We finetuned GPT-NEO-125M and GPT-NEO-1.3B models with 125 million and 1.3 billion parameters, respectively. The latter model generated more coherent text and could link ideas relevant to the same problem better. We report here our findings with examples generated from our finetuned models. © 2022 ACM.

19.
2022 Findings of the Association for Computational Linguistics: EMNLP 2022 ; : 4598-4611, 2022.
Article in English | Scopus | ID: covidwho-2258731

ABSTRACT

Recent research on argumentative dialogues has focused on persuading people to take some action, changing their stance on the topic of discussion, or winning debates. In this work, we focus on argumentative dialogues that aim to open up (rather than change) people's minds to help them become more understanding to views that are unfamiliar or in opposition to their own convictions. To this end, we present a dataset of 183 argumentative dialogues about 3 controversial topics: veganism, Brexit and COVID-19 vaccination. The dialogues were collected using the Wizard of Oz approach, where wizards leverage a knowledge-base of arguments to converse with participants. Open-mindedness is measured before and after engaging in the dialogue using a questionnaire from the psychology literature, and success of the dialogue is measured as the change in the participant's stance towards those who hold opinions different to theirs. We evaluate two dialogue models: a Wikipedia-based and an argument-based model. We show that while both models perform closely in terms of opening up minds, the argument-based model is significantly better on other dialogue properties such as engagement and clarity. © 2022 Association for Computational Linguistics.

20.
2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 ; : 9436-9453, 2022.
Article in English | Scopus | ID: covidwho-2288454

ABSTRACT

Crises such as the COVID-19 pandemic continuously threaten our world and emotionally affect billions of people worldwide in distinct ways. Understanding the triggers leading to people's emotions is of crucial importance. Social media posts can be a good source of such analysis, yet these texts tend to be charged with multiple emotions, with triggers scattering across multiple sentences. This paper takes a novel angle, namely, emotion detection and trigger summarization, aiming to both detect perceived emotions in text, and summarize events and their appraisals that trigger each emotion. To support this goal, we introduce COVIDET (Emotions and their Triggers during Covid-19), a dataset of ~1,900 English Reddit posts related to COVID-19, which contains manual annotations of perceived emotions and abstractive summaries of their triggers described in the post. We develop strong baselines to jointly detect emotions and summarize emotion triggers. Our analyses show that COVIDET presents new challenges in emotion-specific summarization, as well as multi-emotion detection in long social media posts. © 2022 Association for Computational Linguistics.
